First Rule of Machine Learning - Start without Machine Learning

  • Metadata
    • #machine-learning #data-science
  • Rule 1 of [[Google 43 rules of machine learning]] is "Don't be afraid to launch a product without machine learning"
    • Because a good machine learning algorithm needs a lot of data, which a new product often does not have yet
  • So if ML would give you a 100% improvement, a heuristic will often get you 50% of the way there
  • Also it is good to start off without ML to get familiar with the problem and underlying data
  • Things to start with
    • Correlations with scatter plots (if both variables are numerical), box plots (if one variable is categorical)
    • Recommendations based on previous period (Alibaba's swing algorithm)
    • Classification based on regex
    • Spam identification with review timing and similarity between reviews
    • Outlier identification with interquartile range
    • Forecasting with moving average
  • Heuristics can also help with labelling data when starting from scratch (weak supervision)
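
Two of the starting points above, outlier identification with the interquartile range and forecasting with a moving average, fit in a few lines of standard-library Python. This is only a minimal sketch: the function names, the `k=1.5` fence multiplier, the 3-point window, and the `daily_orders` numbers are all assumptions chosen for illustration.

```python
from statistics import mean, quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def moving_average_forecast(history, window=3):
    """Forecast the next point as the mean of the last `window` points."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


# Made-up daily order counts with one obvious spike.
daily_orders = [102, 98, 105, 110, 97, 101, 480, 99, 104]

spikes = iqr_outliers(daily_orders)
cleaned = [v for v in daily_orders if v not in spikes]

print("outliers:", spikes)
print("next-day forecast:", moving_average_forecast(cleaned, window=3))
```

Dropping the flagged points before averaging is a design choice rather than a requirement. Without it, a single spike inside the window would dominate the forecast.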

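Classification based on regex can double as a labelling heuristic for weak supervision: each rule either assigns a label or abstains. The sketch below assumes a hypothetical support-ticket setting. The categories, keyword patterns, and the `classify` helper are invented for illustration and are not taken from the notes above.

```python
import re

# Hypothetical rules: (label, pattern). Order matters; the first match wins.
RULES = [
    ("billing", re.compile(r"\b(refund|invoice|charged?|payment)\b", re.I)),
    ("login", re.compile(r"\b(password|log ?in|sign ?in|locked out)\b", re.I)),
]


def classify(text, default=None):
    """Return the label of the first matching rule, or `default` to abstain."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return default


# Made-up example tickets.
tickets = [
    "I was charged twice, please refund me",
    "Cannot log in after password reset",
    "The app is slow today",
]

# Weak labels: rows where the heuristic abstains (None) stay unlabelled.
weak_labels = [(text, classify(text)) for text in tickets]
for text, label in weak_labels:
    print(label, "<-", text)
```

The rows that receive a label can seed a first training set, while the abstentions show where the rules run out and hand labelling or a model would be needed.
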
When you have a problem, build two solutions - a deep Bayesian transformer running on multi-cloud Kubernetes and a SQL query built on a stack of egregiously oversimplifying assumptions. Put one on your resume, the other in production. Everyone goes home happy.